AITopics

Country:

North America > Canada (0.46)
North America > United States (0.28)

Genre: Research Report (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing Systems, Feb-11-2026, 14:51:57 GMT

In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning

Jiaqi Wang

When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
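The leakage described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' attack. It assumes Gaussian noise added to each vote count before taking the argmax, and the histograms `hist_a` and `hist_b`, the noise scale `sigma`, and the helper names are all hypothetical. It shows only that repeated noisy answers to the same query carry information about the runner-up vote counts, which a noiseless argmax would hide.

```python
import random
from collections import Counter

def noisy_argmax(votes, sigma, rng):
    """Return the label whose vote count is largest after adding Gaussian noise."""
    noisy = [v + rng.gauss(0.0, sigma) for v in votes]
    return max(range(len(votes)), key=lambda i: noisy[i])

def query_many(votes, sigma, n_queries, seed=0):
    """Repeat the same query and return the empirical frequency of each output label."""
    rng = random.Random(seed)
    counts = Counter(noisy_argmax(votes, sigma, rng) for _ in range(n_queries))
    return [counts[i] / n_queries for i in range(len(votes))]

# Two hypothetical teacher vote histograms with the same winning label (class 0).
hist_a = [60, 30, 10]
hist_b = [60, 10, 30]

# With noise, the output frequencies differ between the two histograms,
# so an observer of many answers can tell them apart.
freq_a = query_many(hist_a, sigma=40.0, n_queries=20000)
freq_b = query_many(hist_b, sigma=40.0, n_queries=20000)
print("noisy outputs, histogram A:", freq_a)
print("noisy outputs, histogram B:", freq_b)

# Without noise, both histograms always answer class 0 and are indistinguishable.
print("noiseless outputs:", query_many(hist_a, sigma=0.0, n_queries=100))
```

In this toy setting the frequency of each losing label tracks its underlying vote count, which is the kind of signal the abstract says an adversary can turn into a vote histogram.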

artificial intelligence, histogram, machine learning, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Neural Information Processing Systems, Dec-25-2025, 02:53:00 GMT

In Differential Privacy, There is Truth: on Vote-Histogram Leakage in Ensemble Private Learning

When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.

differential privacy, ensemble private learning, vote-histogram leakage, (6 more...)

Industry: Information Technology > Security & Privacy (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Security & Privacy (0.60)

Sarmasarkar, Sahasrajit, Jiang, Zhihao, Goel, Ashish, Korolova, Aleksandra, Munagala, Kamesh

Multi-Selection for Recommendation Systems

arXiv.org Artificial Intelligence, Apr-11-2025

However, these practices can lead to significant privacy risks, including data exploitation (Barocas and Nissenbaum [2014]), re-identification threats (Narayanan and Shmatikov [2008]), and surveillance concerns (Lyon [2014]). To address these issues, several privacy-preserving techniques have been proposed, including differential privacy (McSherry and Mironov [2009]), federated learning (Ammad-Ud-Din et al. [2019]), homomorphic encryption (Kim et al. [2016]), privacy-preserving matrix factorization (Hua and Xiong [2015]), and k-anonymity (Polat and Du [2005]). Despite their potential, these methods often face challenges such as reduced utility, computational complexity, and communication overhead. In this work, we explore a privacy-preserving recommendation system where user queries are protected using differential privacy within the local trust model (Bebensee [2019]), with a focus on balancing the trade-offs between utility and privacy. In the local trust model, user queries and user features are perturbed (typically by adding noise) to preserve privacy, which can lead to less accurate results from the server. To mitigate this issue, Goel et al. [2024] introduced the concept of multi-selection, where the server returns multiple results, allowing the user to select the most relevant one without disclosing its ... Supported by NSF awards CCF-2113798 and IIS-2402823.
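The multi-selection idea in this excerpt can be sketched in a few lines. This is an illustrative toy, not the paper's mechanism: it assumes scalar queries in [0, 1] perturbed with Laplace noise of scale 1/ε, and a server that simply returns the `k` catalog items nearest to the noisy query. All function names, the catalog, and the parameter values are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(query, epsilon, rng):
    """User side: perturb a query in [0, 1] (sensitivity 1) before sending it."""
    return query + laplace_noise(1.0 / epsilon, rng)

def server_top_k(noisy_query, catalog, k):
    """Server side: return the k catalog items closest to the noisy query."""
    return sorted(catalog, key=lambda item: abs(item - noisy_query))[:k]

def user_select(true_query, candidates):
    """User side: pick the best candidate locally, using the true query."""
    return min(candidates, key=lambda item: abs(item - true_query))

def mean_error(k, epsilon, trials=2000, seed=0):
    """Average distance between the true query and the item the user ends up with."""
    rng = random.Random(seed)
    catalog = [i / 100 for i in range(101)]  # hypothetical items at 0.00, 0.01, ..., 1.00
    total = 0.0
    for _ in range(trials):
        q = rng.uniform(0.0, 1.0)
        candidates = server_top_k(privatize(q, epsilon, rng), catalog, k)
        total += abs(user_select(q, candidates) - q)
    return total / trials

print("k=1 :", mean_error(k=1, epsilon=1.0))
print("k=10:", mean_error(k=10, epsilon=1.0))
```

Because the final choice happens on the user's device, returning more candidates can reduce the error without the server ever seeing the true query.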

artificial intelligence, data mining, machine learning, (20 more...)

2504.07403

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Data Science > Data Mining > Big Data (0.89)

Neural Information Processing Systems, Jan-18-2025, 16:56:58 GMT

In Differential Privacy, There is Truth: on Vote-Histogram Leakage in Ensemble Private Learning

When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers.

differential privacy, ensemble private learning, vote-histogram leakage, (3 more...)

Industry: Information Technology > Security & Privacy (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.81)
Information Technology > Security & Privacy (0.63)

Rokvic, Ljubomir, Danassis, Panayiotis, Faltings, Boi

A Practical Influence Approximation for Privacy-Preserving Data Filtering in Federated Learning

arXiv.org Artificial Intelligence, Jan-25-2023

Federated Learning by nature is susceptible to low-quality, corrupted, or even malicious data that can severely degrade the quality of the learned model. Traditional techniques for data valuation cannot be applied as the data is never revealed. We present a novel technique for filtering and scoring data based on a practical influence approximation ('lazy' influence) that can be implemented in a privacy-preserving manner. Each participant uses their own data to evaluate the influence of another participant's batch, and reports to the center an obfuscated score using differential privacy. Our technique allows for highly effective filtering of corrupted data in a variety of applications. Importantly, we show that most of the corrupted data can be filtered out (recall of >90%, and even up to 100%), even under strong privacy guarantees (ε ≤ 1).
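One standard way to report an obfuscated binary score under local differential privacy is randomized response. The abstract does not say which mechanism is used, so the sketch below is an assumption chosen for illustration: each hypothetical validator holds a bit (1 if the batch looks helpful, 0 otherwise), reports it truthfully with probability e^ε / (1 + e^ε), and the center debiases the aggregate.

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def debiased_fraction(reports, epsilon):
    """Unbiased estimate of the fraction of true 1-votes from the noisy reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p), solved for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Hypothetical scenario: 5000 validators, 80% of whom judge the batch helpful.
rng = random.Random(0)
true_votes = [1] * 4000 + [0] * 1000
reports = [randomized_response(v, 1.0, rng) for v in true_votes]
estimate = debiased_fraction(reports, 1.0)
print("estimated fraction of positive votes:", round(estimate, 3))
```

The center could then keep or discard a batch by comparing the debiased estimate against a threshold; the threshold and the number of validators are design choices not specified here.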

artificial intelligence, data mining, machine learning, (19 more...)

2205.11518

Country:

North America > United States > Virginia (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > France (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

arXiv.org Artificial Intelligence, Sep-21-2022

In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning

Wang, Jiaqi, Schuster, Roei, Shumailov, Ilia, Lie, David, Papernot, Nicolas

When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.

artificial intelligence, histogram, machine learning, (17 more...)

2209.10732

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Artificial Intelligence, Jul-15-2022

Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees

Shoemate, Michael, Jett, Kevin, Cowan, Ethan, Colbath, Sean, Honaker, James, Muthukumar, Prasanna

Speech data is expensive to collect, and incredibly sensitive to its sources. It is often the case that organizations independently collect small datasets for their own use, but often these are not performant for the demands of machine learning. Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, in terms of intellectual property loss as well as loss of privacy of the individuals who exist in the dataset. In this paper, we offer a potential solution for learning an ML model across multiple organizations where we can provide mathematical guarantees limiting privacy loss. We use a Federated Learning approach built on a strong foundation of Differential Privacy techniques. We apply these to a senone classification prototype and demonstrate that the model improves with the addition of private data while still respecting privacy.

dataset, gradient, privacy, (13 more...)

2207.07816

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Ryffel, Théo, Bach, Francis, Pointcheval, David

Differential Privacy Guarantees for Stochastic Gradient Langevin Dynamics

arXiv.org Machine Learning, Feb-5-2022

We analyse the privacy leakage of noisy stochastic gradient descent by modeling Rényi divergence dynamics with Langevin diffusions. Inspired by recent work on non-stochastic algorithms, we derive similar desirable properties in the stochastic setting. In particular, we prove that the privacy loss converges exponentially fast for smooth and strongly convex objectives under constant step size, which is a significant improvement over previous DP-SGD analyses. We also extend our analysis to arbitrary sequences of varying step sizes and derive new utility bounds. Last, we propose an implementation and our experiments show the practical utility of our approach compared to classical DP-SGD libraries.
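For readers unfamiliar with the algorithm being analysed, the following is a minimal sketch of noisy stochastic gradient descent on a smooth, strongly convex objective with a constant step size. It is not the authors' implementation and performs no privacy accounting. The objective (mean squared distance to the data points), the function name `noisy_sgd`, and all parameter values are assumptions made for illustration; a real DP-SGD run would also clip per-example gradients.

```python
import random

def noisy_sgd(data, steps, step_size, sigma, batch_size, seed=0):
    """Minimize the mean of 0.5 * (x - d)^2 over d in data using noisy minibatch gradients."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # Minibatch gradient of the quadratic objective at the current iterate.
        grad = sum(x - d for d in batch) / batch_size
        # Gaussian noise is added to the gradient at every step.
        x -= step_size * (grad + rng.gauss(0.0, sigma))
    return x

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical dataset; the exact minimizer is the mean, 2.5
x_final = noisy_sgd(data, steps=2000, step_size=0.01, sigma=0.1, batch_size=2)
print("final iterate:", round(x_final, 3))
```

The iterate settles into a noisy neighbourhood of the minimizer rather than converging exactly, which is the regime the constant-step-size analysis in the abstract concerns.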

differential privacy, differential privacy guarantee, privacy, (13 more...)

2201.1198

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.97)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Tople, Shruti, Sharma, Amit, Nori, Aditya

Alleviating Privacy Attacks via Causal Learning

arXiv.org Machine Learning, Sep-27-2019

Machine learning models, especially deep neural networks, have been shown to reveal membership information of inputs in the training data. Such membership inference attacks are a serious privacy concern. For example, patients providing medical records to build a model that detects HIV would not want their identity to be leaked. Further, we show that the attack accuracy amplifies when the model is used to predict samples that come from a different distribution than the training set, which is often the case in real-world applications. Therefore, we propose the use of causal learning approaches where a model learns the causal relationship between the input features and the outcome. Causal models are known to be invariant to the training distribution and hence generalize well to shifts between samples from the same distribution and across different distributions. First, we prove that models learned using causal structure provide stronger differential privacy guarantees than associational models under reasonable assumptions. Next, we show that causal models trained on sufficiently large samples are robust to membership inference attacks across different distributions of datasets, and that those trained on smaller sample sizes always have lower attack accuracy than corresponding associational models. Finally, we confirm our theoretical claims with experimental evaluation on 4 datasets with moderately complex Bayesian networks. We observe that neural network-based associational models exhibit up to 80% attack accuracy under different test distributions and sample sizes, whereas causal models exhibit attack accuracy close to a random guess. Our results confirm the value of the generalizability of causal models in reducing susceptibility to privacy attacks.

accuracy, causal model, dataset, (15 more...)

1909.12732

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.86)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)